Adaptive Layout Analysis of Document Images

Authors

  • Donato Malerba
  • Floriana Esposito
  • Oronzo Altamura
Abstract

Layout analysis is the process of extracting a hierarchical structure describing the layout of a page. In the document processing system WISDOM++, layout analysis is performed in two steps: first, the global analysis determines possible areas containing paragraphs, sections, columns, figures and tables; second, the local analysis groups together blocks that possibly fall within the same area. The result of the local analysis strongly depends on the quality of the results of the first step. In this paper we investigate the possibility of supporting the user during the correction of the results of the global analysis. This is done by allowing the user to correct the results of the global analysis and then by learning rules for layout correction from the sequence of user actions. Experimental results on a set of multi-page documents are reported.

1 Background and motivations

Processing document images, that is, bitmaps of scanned paper documents, is a complex task involving many activities, such as preprocessing, segmentation, layout analysis, classification, understanding and text extraction [6]. All of these activities are important, although the extraction of the right layout structure is deemed the most critical. Layout analysis is the perceptual organization process that aims at detecting structures among blocks extracted by the segmentation algorithm. The result is a hierarchy of abstract representations of the document image, called the layout structure of the document. The leaves of the layout tree (the lowest level of the abstraction hierarchy) are the blocks, while the root represents the set of pages of the whole document. A page may include several layout components, called frames, which are rectangular areas corresponding to groups of blocks.

Strategies for the extraction of the layout structure have traditionally been classified as top-down or bottom-up [10].
In top-down methods, the document image is repeatedly decomposed into smaller and smaller components, while in bottom-up methods, basic layout components are extracted from bitmaps and then grouped together into larger blocks on the basis of their characteristics. In WISDOM++ (www.di.uniba.it/~malerba/wisdom++/), a document image analysis system that can transform paper documents into either HTML or XML format [1], the applied page decomposition method is hybrid, since it combines a top-down approach to segment the document image with a bottom-up layout analysis method to assemble basic blocks into frames.

Some attempts at learning the layout structure from a set of training examples have also been reported in the literature [2,3,4,8,11]. They are based on ad hoc learning algorithms, which learn particular data structures, such as geometric trees and tree grammars. Results are promising, although it has been proven that good layout structures can also be obtained by exploiting generic knowledge of typographic conventions [5]. This is the case for WISDOM++, which analyzes the layout in two steps:

1. A global analysis of the document image, in order to determine possible areas containing paragraphs, sections, columns, figures and tables. This step is based on an iterative process in which the vertical and horizontal histograms of text blocks are alternately analyzed, in order to detect columns and sections/paragraphs, respectively.
2. A local analysis of the document, to group together blocks that possibly fall within the same area. Generic knowledge of Western-style typesetting conventions is exploited to group blocks together, such as “the first line of a paragraph can be indented” and “in a justified text, the last line of a paragraph can be shorter than the previous one”.

Experimental results proved the effectiveness of this knowledge-based approach on images of the first page of papers published in either conference proceedings or journals [1].
However, performance degenerates when the system is tested on intermediate pages of multi-page articles, where the structure is much more variable, due to the presence of formulae, images, and drawings that can stretch over more than one column or lie quite close to one another. The main source of the errors made by the layout analysis module was the global analysis step, while the local analysis step performed satisfactorily when the result of the global analysis was correct.

In this paper, we investigate the possibility of supporting the user during the correction of the results of the global analysis. This is done by means of two new system facilities:

1. The user can correct the results of the layout analysis by either grouping or splitting the columns/sections automatically produced by the global analysis.
2. The user can ask the system to learn grouping/splitting rules from his/her sequence of actions correcting the results of the layout analysis.

The proposed approach is different from those that learn the layout structure from scratch, since we try to correct the result of a global analysis returned by a bottom-up algorithm. Furthermore, we intend to capture knowledge on correcting actions performed by the user of the document image processing system. Other document processing systems allow users to correct the result of the layout analysis; nevertheless, WISDOM++ is the only one that tries to learn correcting actions from user interaction with the system.

In the following section, the layout correction operations are described and the automated generation of training examples is explained. Section 3 briefly introduces the learning system used to generate layout correction rules and presents some preliminary experimental results.

2 Correcting the results of the global analysis

Global analysis aims at determining the general layout structure of a page and operates on a tree-based representation of nested columns and sections.
The levels of columns and sections alternate, which means that a column contains sections, while a section contains columns. At the end of the global analysis, the user can only see the sections and columns that have been considered atomic, that is, not subject to further decomposition (Figure 1). The user can correct this result by means of three different operations:

  • Horizontal splitting: a column/section is cut horizontally.
  • Vertical splitting: a column/section is cut vertically.
  • Grouping: two sections/columns are merged together.

Fig. 1. Results of the global analysis process: one column (left) includes two sections (right). The result of the local analysis process (i.e., the frames) is reported in the background.

The cut point in the two splitting operations is automatically determined by computing either the horizontal or the vertical histogram on the basic blocks returned by the segmentation algorithm. The horizontal (vertical) cut point corresponds to the largest gap between two consecutive bins in the horizontal (vertical) histogram. Therefore, splitting operations can be described by means of a binary function, namely split(X,S), where X represents the column/section to be split, S is an ordinal number representing the step of the correction process, and the range of the split function is the set {horizontal, vertical, no_split}.

The grouping operation, which can be described by means of a ternary predicate group(A,B,S), is applicable to two sections (columns) A and B and returns a new section (column) C, whose boundary is determined as follows. Let (leftX, topX) and (rightX, bottomX) be the coordinates of the top-left and bottom-right vertices of a column/section X, respectively. Then:

leftC = min(leftA, leftB), rightC = max(rightA, rightB), topC = min(topA, topB), bottomC = max(bottomA, bottomB).

Grouping is possible only if the following two conditions are satisfied:

1. C does not overlap another section (column) in the document.
2. A and B are nested in the same column (section).

After each splitting/grouping operation, WISDOM++ recomputes the result of the local analysis process, so that the user can immediately perceive the final effect of the requested corrections and can decide whether to confirm the correction or not. From the user interaction, WISDOM++ implicitly generates some training observations describing when and how the user intended to correct the result of the global analysis. These training observations are used to learn rules for correcting the result of the global analysis, as explained below.

3 Learning rules for layout correction

The inductive learning problem to be solved concerns the concepts split(X,S)=horizontal, split(X,S)=vertical and group(X,Y,S)=true, since we are interested in finding rules that predict both when to split a column/section horizontally or vertically and when to group two columns/sections. No rule is generated for the cases split(X,S)=no_split and group(X,Y,S)=false.

The definition of a suitable representation language for the global layout structure is a key issue. In this work, we restrict this representation to the lowest column and section levels in the tree structure extracted by the global analysis, and we deliberately ignore other levels as well as their composition hierarchy. Nevertheless, describing this portion of the layout structure is not straightforward, since the columns and sections are spatially related and the feature-vector representation typically adopted in statistical approaches cannot render these relations. In this work, the application of a first-order logic language has been explored. In this language, unary function symbols, called attributes, are used to describe properties of a single layout component (e.g., height and width), while binary predicate and function symbols, called relations, are used to express spatial relationships among layout components (e.g., part_of and on_top).
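The correction operations of Section 2 can be made concrete with a minimal Python sketch, which is not the WISDOM++ implementation. The names Rect, group, largest_gap and horizontal_cut are hypothetical. Three simplifications are assumptions not stated in the text: the histogram gap is approximated by the gap between the projections of the blocks on one axis, the cut is placed at the midpoint of the largest gap, and the "nested in the same column (section)" condition is passed in as a boolean flag.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; y grows downwards, as in image coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def bounding_box(a: Rect, b: Rect) -> Rect:
    """Boundary of the component C obtained by grouping A and B."""
    return Rect(min(a.left, b.left), min(a.top, b.top),
                max(a.right, b.right), max(a.bottom, b.bottom))


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share some interior area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def group(a: Rect, b: Rect, others: List[Rect], same_parent: bool) -> Optional[Rect]:
    """Return C, or None when one of the two grouping conditions fails."""
    if not same_parent:          # condition 2 (assumed to be checked by the caller)
        return None
    c = bounding_box(a, b)
    if any(overlaps(c, o) for o in others):   # condition 1
        return None
    return c


def largest_gap(intervals: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Largest empty stretch between block projections on one axis."""
    best = None
    reach = None                 # furthest coordinate covered so far
    for start, end in sorted(intervals):
        if reach is not None and start > reach:
            if best is None or start - reach > best[1] - best[0]:
                best = (reach, start)
        reach = end if reach is None else max(reach, end)
    return best


def horizontal_cut(blocks: List[Rect]) -> Optional[int]:
    """y coordinate of a horizontal cut, or None if the blocks leave no gap."""
    gap = largest_gap([(b.top, b.bottom) for b in blocks])
    return None if gap is None else (gap[0] + gap[1]) // 2
```

A vertical cut would follow the same pattern, projecting each block as (left, right) instead of (top, bottom).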
An example of a training observation automatically generated by WISDOM++ follows:

split(c1,s)=horizontal, group(s1,s2,s)=false, split(s1,s)=no_split, split(s2,s)=no_split
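To illustrate how the literals shown above might be handled programmatically, the following Python sketch parses them into structured records and keeps only those that correspond to the target concepts for which rules are learned. It covers only the literals visible in this excerpt. The names Assignment, parse_observation and target_literals are hypothetical and are not part of WISDOM++ or of the learning system.

```python
import re
from typing import List, NamedTuple, Tuple


class Assignment(NamedTuple):
    """One literal of the form name(arg1,...,argN)=value."""
    name: str
    args: Tuple[str, ...]
    value: str


# name(args)=value, with args a comma-separated list without nested parentheses
_LITERAL = re.compile(r"(\w+)\(([^()]*)\)\s*=\s*(\w+)")

# Values for which rules are learned; no_split and false are excluded.
_TARGET_VALUES = {"horizontal", "vertical", "true"}


def parse_observation(text: str) -> List[Assignment]:
    """Extract every name(args)=value literal found in the text."""
    return [
        Assignment(name, tuple(a.strip() for a in args.split(",")), value)
        for name, args, value in _LITERAL.findall(text)
    ]


def target_literals(literals: List[Assignment]) -> List[Assignment]:
    """Keep the literals that are instances of the concepts to be learned."""
    return [lit for lit in literals if lit.value in _TARGET_VALUES]


observation = ("split(c1,s)=horizontal, group(s1,s2,s)=false, "
               "split(s1,s)=no_split, split(s2,s)=no_split")
literals = parse_observation(observation)
positives = target_literals(literals)
```

In this observation only split(c1,s)=horizontal is an instance of a concept to be learned; the remaining three literals fall under the cases for which no rule is generated.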

Similar resources

Logical Labeling of Document Images Using Layout Graph Matching with Adaptive Learning

Logical structure analysis of document images is an important problem in document image understanding. In this paper, we propose a graph matching approach to label logical components on a document page. Our system is able to learn a model for a document class, use this model to label document images through graph matching, and adaptively improve the model with error feedback. We tested our met...

Layout Based Information Retrieval from Document Images

This research is intended to develop a layout-based retrieval system for document image databases, consisting of three phases: 1. First, an intelligent layout analysis algorithm has been designed to extract the physical layouts of the document images, with their edges and rectangles. 2. Every physically identified layout has been converted into a tree intermediary representation for indexing and s...

Distributed Autonomous Agents for Chinese Document Images Segmentation

In Chinese document image processing, text and/or graphical block detection serves as an essential step in document layout analysis that in turn permits the effective reasoning about the logical relationships among various text paragraphs and graphical entities for the purpose of document understanding. This paper presents a novel computational paradigm for extracting text/graphic blocks from Ch...

Layout Analysis for Camera-Based Whiteboard Notes

A domain where, even in the era of electronic document processing, handwriting is still widely used is note-taking on a whiteboard. Such documents are either captured by a pen-tracking device or – which is much more challenging – by a camera. In both cases the layout analysis of realistic whiteboard notes is an open research problem. In this paper we propose a camera-based three-stage approach ...

Parameter-Free Geometric Document Layout Analysis

Automatic transformation of paper documents into electronic documents requires geometric document layout analysis at the first stage. However, variations in character font sizes, text line spacing, and document layout structures have made it difficult to design a general-purpose document layout analysis algorithm for many years. The use of some parameters has therefore been unavoidable in prev...

Adaptive Document Layout via Manifold Content

We present and explore a simple idea for improving document layout on arbitrary devices of different resolutions and size. The key idea is to allow manifold representations of content: multiple versions of anything that might appear in a document, such as text, images, or even stylistic conventions. Content is then selected and formatted dynamically, on the fly, by a layout engine in order to b...

Publication date: 2002